Support for Parquet Modular Encryption for PyArrow FileIO #3
Open: yashad-margaj wants to merge 2 commits into protegrity:main from
…edentials, and Unified GCP Credentials.
Rationale for this change
The current version of iceberg-python does not support Parquet Modular Encryption (PME), so Parquet data files are written in plaintext. This change addresses that limitation by adding PME support, starting with PyArrow FileIO.
Are these changes tested?
Yes. These changes were tested locally, and PME works as expected.
Are there any user-facing changes?
Yes.
When creating a PyIceberg catalog, the user must add the following property:
client.kms-vendor
Valid values are "aws", "azure", or "gcp".
Example:
from pyiceberg.catalog import load_catalog
catalog = load_catalog(name="default", **{"client.kms-vendor": "aws", "type": "sql", "uri": "uri", "warehouse": "warehouse"})
If client.kms-vendor is "aws", the user must also add the following properties:
client.access-key-id
client.region
client.secret-access-key
If client.kms-vendor is "azure", the user must also add the following properties:
client.client-id
client.client-secret
client.tenant-id
If client.kms-vendor is "gcp", the user must also add the following property:
client.oauth2-token
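The vendor-specific requirements above amount to a simple lookup: each client.kms-vendor value implies a fixed set of credential properties. A minimal sketch of how a catalog could check them up front is shown below. The names REQUIRED_KMS_PROPERTIES and validate_kms_properties are hypothetical and are not taken from this PR or from PyIceberg; only the property keys come from the description above.

```python
# Hypothetical sketch: REQUIRED_KMS_PROPERTIES and validate_kms_properties are
# illustrative names, not part of this PR or of PyIceberg.
from typing import Dict, List

# Extra catalog properties required for each supported client.kms-vendor value,
# as listed in the PR description.
REQUIRED_KMS_PROPERTIES: Dict[str, List[str]] = {
    "aws": ["client.access-key-id", "client.region", "client.secret-access-key"],
    "azure": ["client.client-id", "client.client-secret", "client.tenant-id"],
    "gcp": ["client.oauth2-token"],
}


def validate_kms_properties(properties: Dict[str, str]) -> str:
    """Return the KMS vendor after checking that its credential properties are present."""
    vendor = properties.get("client.kms-vendor")
    if vendor not in REQUIRED_KMS_PROPERTIES:
        raise ValueError(
            f"client.kms-vendor must be one of {sorted(REQUIRED_KMS_PROPERTIES)}, got {vendor!r}"
        )
    # Collect every required key that the caller did not supply.
    missing = [key for key in REQUIRED_KMS_PROPERTIES[vendor] if key not in properties]
    if missing:
        raise ValueError(f"missing properties for {vendor!r}: {', '.join(missing)}")
    return vendor
```

For example, validate_kms_properties({"client.kms-vendor": "gcp", "client.oauth2-token": "token"}) returns "gcp", while an "aws" configuration without client.access-key-id raises a ValueError naming the missing key. Failing at catalog creation time surfaces configuration mistakes before any file is written.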
Complete AWS example:
catalog = load_catalog(
    name="default",
    **{
        "client.access-key-id": "client.access-key-id",
        "client.kms-vendor": "aws",
        "client.region": "client.region",
        "client.secret-access-key": "client.secret-access-key",
        "type": "sql",
        "uri": "uri",
        "warehouse": "warehouse",
    },
)
Similarly, when creating a PyIceberg table, the user must add the following properties:
table.column-key
table.footer-key
table.keep-footer-in-plaintext
Valid values for table.keep-footer-in-plaintext are "yes" or "no".
Example:
table = catalog.create_table(
    identifier="default.table",
    schema=schema,  # a PyIceberg Schema defined elsewhere; create_table requires it
    properties={
        "table.column-key": {"table.column-key": ["column_name"]},
        "table.footer-key": "table.footer-key",
        "table.keep-footer-in-plaintext": "no",
    },
)
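These three table properties line up with the main keyword arguments of PyArrow's `pyarrow.parquet.encryption.EncryptionConfiguration` (`footer_key`, `column_keys`, `plaintext_footer`). The sketch below shows one plausible translation, assuming table.column-key maps a master key ID to the list of columns it encrypts. The helper name encryption_kwargs_from_table_properties is hypothetical, and this is not necessarily how the PR performs the mapping.

```python
# Hypothetical sketch: encryption_kwargs_from_table_properties is an illustrative
# name, not a function from this PR. Its output mirrors keyword arguments of
# pyarrow.parquet.encryption.EncryptionConfiguration.
from typing import Any, Dict, List


def encryption_kwargs_from_table_properties(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Map the table.* encryption properties to PyArrow PME configuration arguments."""
    plaintext = properties.get("table.keep-footer-in-plaintext", "no")
    if plaintext not in ("yes", "no"):
        raise ValueError('table.keep-footer-in-plaintext must be "yes" or "no"')
    # Assumed shape: master key ID -> list of column names encrypted with that key.
    column_keys: Dict[str, List[str]] = dict(properties.get("table.column-key", {}))
    return {
        "footer_key": properties["table.footer-key"],
        "column_keys": column_keys,
        # "yes" keeps the footer readable (but signed); "no" encrypts it.
        "plaintext_footer": plaintext == "yes",
    }
```

With PyArrow installed, the returned dictionary could be passed as `EncryptionConfiguration(**kwargs)` and combined with a KMS connection configuration through `CryptoFactory.file_encryption_properties(...)` before writing a file. Keeping the string-to-boolean conversion in one place means the "yes"/"no" convention is validated once rather than at every write.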